html-encoding-sniffer

Sniff the encoding from a HTML byte stream

Version: 4.0.0 (latest)

Version published: 12 months ago

Weekly downloads: 22M; decreased by 9.58%

Maintainers: 6

Created: 8 years ago

What is html-encoding-sniffer?

The html-encoding-sniffer npm package determines the character encoding of an HTML document from its raw bytes. It checks the start of the byte stream for a byte order mark (BOM) and for encoding declarations in meta tags, and it can also take into account an encoding label that the caller supplies from the transport layer, such as the charset in an HTTP Content-Type header. This is useful for applications that must decode HTML from varied sources so that text is interpreted and displayed correctly.

What are html-encoding-sniffer's main functionalities?

Sniffing HTML encoding from HTTP headers

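This feature lets you supply the encoding declared by the transport layer, typically the charset from an HTTP Content-Type header. The package does not read HTTP headers itself: you pass the declared label through the 'transportLayerEncodingLabel' option, and it takes precedence over everything except a BOM.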

"use strict";
const htmlEncodingSniffer = require('html-encoding-sniffer');
// byteStream is a Uint8Array (or Node.js Buffer) holding the raw HTML bytes.
const encoding = htmlEncodingSniffer(byteStream, { transportLayerEncodingLabel: 'utf-8' });
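In practice the label usually has to be extracted from a Content-Type header value first. The following is a minimal sketch of one way to do that; charsetFromContentType is a hypothetical helper written for this example, not part of html-encoding-sniffer, and it handles only the common charset=... forms.

```javascript
"use strict";

// Hypothetical helper (not part of html-encoding-sniffer): pull the charset
// parameter out of a Content-Type header value, or return undefined.
function charsetFromContentType(contentType) {
  if (typeof contentType !== "string") {
    return undefined;
  }
  // Matches charset=utf-8 and charset="utf-8", with optional whitespace.
  const match = /;\s*charset\s*=\s*("?)([^";\s]+)\1/i.exec(contentType);
  return match ? match[2] : undefined;
}

const label = charsetFromContentType("text/html; charset=ISO-8859-2");
console.log(label);
```

The extracted label could then be passed as the 'transportLayerEncodingLabel' option. A header without a charset parameter yields undefined here, in which case the option can simply be omitted.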

Sniffing HTML encoding from a meta tag

This feature enables the detection of the document's encoding by looking for a meta tag within the HTML that specifies the encoding. The 'defaultEncoding' option allows you to specify a fallback encoding in case no encoding is declared in the document.

"use strict";
const htmlEncodingSniffer = require('html-encoding-sniffer');
// byteStream is a Uint8Array (or Node.js Buffer) holding the raw HTML bytes.
const encoding = htmlEncodingSniffer(byteStream, { defaultEncoding: 'windows-1252' });

Other packages similar to html-encoding-sniffer

iconv-lite
jschardet

Determine the Encoding of a HTML Byte Stream

This package implements the HTML Standard's encoding sniffing algorithm in all its glory. The most interesting part of this is how it pre-scans the first 1024 bytes in order to search for certain <meta charset>-related patterns.

const htmlEncodingSniffer = require("html-encoding-sniffer");
const fs = require("fs");

const htmlBytes = fs.readFileSync("./html-page.html");
const sniffedEncoding = htmlEncodingSniffer(htmlBytes);

The passed bytes are given as a Uint8Array; the Node.js Buffer subclass of Uint8Array will also work, as shown above.

The returned value will be a canonical encoding name (not a label). You might then combine this with the whatwg-encoding package to decode the result:

const whatwgEncoding = require("whatwg-encoding");
const htmlString = whatwgEncoding.decode(htmlBytes, sniffedEncoding);
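To give a rough feel for the pre-scan of the first 1024 bytes mentioned above, here is a deliberately simplified sketch. It is not the package's implementation and not the full algorithm from the HTML Standard, which works on bytes directly and also handles comments, other tags, and the http-equiv/content form. naiveMetaCharset is a hypothetical name, and the function returns the raw label it finds, not a canonical encoding name.

```javascript
"use strict";

// Simplified illustration only: a regex over the first 1024 bytes, looking
// for the <meta charset=...> form and nothing else.
function naiveMetaCharset(bytes) {
  // Only the first 1024 bytes are inspected.
  const head = bytes.subarray(0, 1024);
  // latin1 maps each byte to exactly one character, so ASCII markup survives.
  const text = Buffer.from(head).toString("latin1");
  const match = /<meta\s+charset\s*=\s*["']?([^"'\s\/>]+)/i.exec(text);
  return match ? match[1].toLowerCase() : null;
}

const sample = new TextEncoder().encode('<!doctype html><meta charset="ISO-8859-2"><p>hi</p>');
console.log(naiveMetaCharset(sample));
```

A declaration that starts beyond the 1024-byte window is not seen by this sketch, which is why placing the meta tag early in the document matters.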

Options

You can pass two potential options to htmlEncodingSniffer:

const sniffedEncoding = htmlEncodingSniffer(htmlBytes, {
  transportLayerEncodingLabel,
  defaultEncoding
});

These represent two possible inputs into the encoding sniffing algorithm:

transportLayerEncodingLabel is an encoding label that is obtained from the "transport layer" (probably an HTTP Content-Type header), which overrides everything but a BOM.
defaultEncoding is the ultimate fallback encoding used if no valid encoding is supplied by the transport layer, and no encoding is sniffed from the bytes. It defaults to "windows-1252", as recommended by the algorithm's table of suggested defaults for "All other locales" (including the en locale).
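The precedence described by these two options can be sketched as follows. This is an illustration of the ordering only (BOM, then transport layer, then whatever is sniffed from the bytes, then the default), assuming every input is already a valid encoding name or empty. bomEncoding and pickEncoding are hypothetical names, not functions exported by the package.

```javascript
"use strict";

// Hypothetical helper: detect a byte order mark at the start of the input.
function bomEncoding(bytes) {
  if (bytes.length >= 3 && bytes[0] === 0xef && bytes[1] === 0xbb && bytes[2] === 0xbf) {
    return "UTF-8";
  }
  if (bytes.length >= 2 && bytes[0] === 0xfe && bytes[1] === 0xff) {
    return "UTF-16BE";
  }
  if (bytes.length >= 2 && bytes[0] === 0xff && bytes[1] === 0xfe) {
    return "UTF-16LE";
  }
  return null;
}

// Hypothetical helper: the first source that yields a value wins.
function pickEncoding({ bom, transportLayer, sniffed, defaultEncoding = "windows-1252" }) {
  return bom || transportLayer || sniffed || defaultEncoding;
}

const bytes = Uint8Array.of(0xef, 0xbb, 0xbf, 0x3c, 0x70, 0x3e);
console.log(pickEncoding({ bom: bomEncoding(bytes), transportLayer: "ISO-8859-2" }));
```

With no BOM, no transport-layer label, and nothing sniffed, pickEncoding falls back to "windows-1252", mirroring the default described above.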

Credits

This package was originally based on the excellent work of @nicolashenry, in jsdom. It has since been pulled out into this separate package.

Package last updated on 12 Nov 2023
